Gaussian Models for the Statistical Thermodynamics of Liquid Water 
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A gaussian distribution of binding energies, but conditioned to exploit generally available infor- 
mation on packing in liquids, provides a statistical-thermodynamic theory of liquid water that is 
structurally non-committal, molecularly realistic, and surprisingly accurate. Neglect of fluctuation 
contributions to this gaussian model yields a mean-field theory that produces useless results. A 
refinement that accounts for sharper-than-gaussian behavior at high binding energies recognizes 
contributions from a discrete number of water molecules and permits a natural matching of numer- 
ically exact results. These gaussian models, which can be understood as vigorous simplifications 
of quasi-chemical theories, are applicable to aqueous environments where the utility of structural 
models based on geometrical considerations of water hydrogen bonding have not been established. 



A widely accepted molecular-scale understanding of 
liquid water under physiological conditions has evolved 
over recent decades based upon the concept of promis- 
cuous hydrogen-bonding that results in a thoroughly 
networked fluid 0. This view rests upon extensive 
molecular-scale simulation validated with traditional 
experimentation, and communicated with molecular- 
graphics tools. But water is a peculiar liquid. For ex- 
ample, the van der Waals theory Q, which provides the 
firmest basis for theories of simple liquids, is unsatis- 
factory for liquid water. Fig. ^ shows one experimental 
demonstration of that point. The foremost task in ap- 
plying molecular statistical mechanical theory to liquid 
water is to address the equation-of-state distinctions ex- 
emplified in Fig. n on the basis of realistic intermolecular 
interactions. One theoretical approach is to accept the 
voluminous data that can be generated in a typical, real- 
istic molecular simulation, but to craft a concise, quanti- 
tative statistical description of the basic thermodynamic 
characteristics 0, • As exemplified below, those statis- 
tical theories can be concise indeed, and general in scope. 

The focus here is analyzing the probability density dis- 
tribution, p(e), of binding energies, e, exhibited by a wa- 
ter molecule in liquid water. Thermodynamic properties 
are typically sensitive to the tails of this distribution, As 
a topical example, note that the population of weakly 
bound water molecules in liquid water can be decisive in 
filling transitions of carbon nanotubes Thus, it can 
be helpful to have a clear idea how those weakly bound 
populations can be analyzed, and a focus of this work is 
the analysis of p(e). 

Our motivation for the present analysis is the observa- 
tion of severely non-gaussian p(s) in cases where repul- 
sive interactions are prominent contributors Q. In such 
cases, conditioning to separate out effects of repulsive in- 
teractions was found to yield conditional distributions, 
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1: (du/dv) T for several solvents as a function of tem- 



FIG. 

perature along the vapor saturation curve. For van der Waals 
liquids (du/dv) T » ap 2 . Organic solvents conform to this ex- 
pectation, but water is qualitatively different. The data are 
from Ifjl. 



P (e\r > A), that were accurately gaussian. The idea is to 
account separately for close molecular encounters; then 
the direct statistical problem of evaluating the distribu- 
tion of binding energies need only consider the fraction of 
the sample for which the distance to the nearest solvent 
molecule center, r, is greater than the conditioning ra- 
dius A. That fraction is p (r > A), the marginal probabil- 
ity. In previous quasi-chemical treatments, the marginal 
probability p (r > A) was denoted by x 0, H S E3 . 

To follow that path, we seek /i ex , the chemical poten- 
tial of water in excess of the ideal contribution at the 
same density and temperature. On the basis of simula- 
tion data, we consider evaluating /u|jg, the excess chemi- 
cal potential of a hard sphere solute, relative to u ex . The 
potential distribution theorem (PDT) 0, EH El then 
yields 
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^ cx ) = p ( r > A) / p (e\r > A) e 0B ds 



(1) 



The thermodynamic temperature is T = 1/ksP where 
ks is the Boltzmann's constant. Since is known Q, 
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FIG. 2: Probability density p(e\r > A) of the binding en- 
ergy of a water molecule in liquid water at 298 K. A = 
0.0, . . . 0.300 nm, from top to bottom with successive results 
shifted incrementally shifted downward by 2 for clarity. The 
solid lines are the gaussian model for each data set. 



Eq. JIJ gives /i ox . We regard this conditioning as a regu- 
larization of the statistical problem embodied in Eq. iJTJl 
when A — > 0, which is practically impossible on the ba- 
sis of a direct, single calculation. After regularization, 
the statistical problem becomes merely difficult. A gaus- 
sian distribution model for p (e\r > A) should be accurate 
when A — ► oo, since then many solution elements will 
make small, weakly-correlated contributions to e. The 
marginal probability p (r > A) becomes increasingly dif- 
ficult to evaluate as A becomes large, however. For A 
on the order of molecular length scales typical of dense 
liquids, a simple gaussian model would accept some ap- 
proximation error as the price for manageable statistical 
error. If p (e\r > A) is modeled by a gaussian of mean 
(e\r > A) and variance (5e 2 \r > A), then 

// x - Mhs - kTlnp (r > A) - (e|r > A) = 

1 



2kT 



(Se 2 \r>\) . (2) 



This simple model motivates the following analyses. 

To test these ideas, simulation data for liquid water 
was generated at 298, 350, and 400 K and 1 bar using 
methods described in [9j. The hard-sphere excess chem- 
ical potential was obtained from Q. The distributions 
observed for T — 298 K are shown in FigEl Tabled col- 
lects the individual terms for the gaussian model, Eq. J3J, 
at each temperature. The observed dependence on A of 
the free energy at each temperature is shown in Fig|3 

Fig. shows that the unconditioned distribution p (e) 
displays positive skew, but the conditioning diminishes 
that skew perceptibly, as expected. p(e\r > A) is least 
skewed for the largest A, though the sample used is 
smaller by the fraction p (r > A), and thus less of the tail 
region is available for examination as A becomes larger. 



The conditioning affects both the high-e and low-e 
tails of these distributions. The mean binding energy 
(e\r > A) increases with increasing A [Table H|, so we 
conclude that the conditioning eliminates atypical low- 
e, well-bound configurations more than high-e configu- 
rations that reflect less favorable interactions. Never- 
theless, because of the exponential weighting of the in- 
tegrand of Eq. JIJ and because the variances are large, 
the high-e side of the distributions is overwhelmingly the 
more significant in this free energy prediction. 

Conversely, the fluctuation contribution exhibits a 
broad maximum for A < 0.29 nm, after which this con- 
tribution decreases steadily with increasing A [Table U] . 
Evidently water molecules closest to the distinguished 
molecule, i.e., those closer than the principal maximum 
of oxygen-oxygen radial distribution function, don't con- 
tribute importantly to the fluctuations. This is consistent 
with a quasi-chemical picture in which a water molecule 
and its nearest neighbors have a definite structural in- 
tegrity. 

The magnitude of the individual contributions to /i GX 
are of the same order as the net free energy; the mean 
binding energies are larger than that, as are the vari- 
ance contributions in some cases. The variance contribu- 
tions are about half as large as the mean binding ener- 
gies, with opposite sign. It is remarkable and significant, 
therefore, that the net free energies at 298 K are within 
roughly 12% of the numerically exact value computed 
by the histogram-overlap method. The discrepancies at 
the higher temperatures are larger, and we will return 
to that point. A mean- field-like approximation that ne- 
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FIG. 3: Dependence of the free energy // x predicted by the 
gaussian model on the conditioning radius A. The horizontal 
dotted lines are the numerically exact results. The error bars 
indicate approximate 95% confidence intervals. 
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TABLE I: Free energy contributions in kcal/mol associated with the gaussian model. The bottom value of the right-most 
column at each temperature gives the corresponding free energy evaluated by the histogram overlap method. 



T(K) 


A (nm) 


/xf s (A) +kTlnp (r > A) - 


f (e\r > A) + 


> A)/2fcT 






298 


0.2600 


2.80 


-0.04 


-19.74 


+9.87 


= 


-7.11+ 0.02 




0.2650 


2.99 


-0.13 


-19.68 


+9.93 


= 


-6.89+ 0.03 




0.2675 


3.09 


-0.20 


-19.59 


+9.97 


= 


-6.73+ 0.04 




0.2700 


3.19 


-0.31 


-19.46 


+9.98 


= 


-6.60+ 0.03 




0.2725 


3.29 


-0.44 


-19.27 


+9.97 




-6.45± 0.03 




0.2750 


3.40 


-0.60 


-19.03 


+9.92 


= 


-6.31+ 0.03 




0.2775 


3.50 


-0.78 


-18.73 


+9.83 


= 


-6.18+ 0.04 




0.2800 


3.61 


-0.98 


-18.39 


+9.71 


= 


-6.05+ 0.03 




0.2900 


4.07 


-1.96 


-16.75 


+8.89 


= 


-5.75+ 0.04 




0.3000 


4.56 


-3.09 


-14.93 


+7.77 


= 


-5.69+ 0.06 




0.3100 


5.05 


-4.27 


-13.21 


+6.67 


= 


-5.67± 0.18 




0.3200 


5.61 


-5.45 


-11.65 


+5.54 


= 


-5.95+ 0.37 




0.3300 


6.20 


-6.64 


-10.30 


+4.78 


= 


-5.96+ 0.97 
-6.49 


350 


0.2600 


3.12 


-0.05 


-18.43 


+9.44 


= 


-5.92+ 0.02 




0.2625 


3.23 


-0.09 


-18.41 


+9.46 




-5.81+ 0.02 




0.2700 


3.55 


-0.33 


-18.14 


+9.47 




-5.45+ 0.02 




0.2750 


3.77 


-0.62 


-17.73 


+9.35 


= 


-5.23+ 0.02 




0.2800 


4.00 


-0.99 


-17.15 


+9.10 


= 


-5.04+ 0.01 




0.2900 


4.50 


-1.92 


-15.67 


+8.24 


= 


-4.85+ 0.04 




0.3000 


5.02 


-3.00 


-14.05 


+7.20 


= 


-4.83+ 0.06 




0.3100 


5.58 


-4.13 


-12.48 


+6.14 


= 


-4.89+ 0.10 




0.3200 


6.18 


-5.28 


-11.01 


+5.24 


= 


-4.87+ 0.22 




0.3300 


6.80 


-6.41 


-9.74 


+4.47 


= 


-4.88+ 0.45 
-5.83 


400 


0.2600 


3.30 


-0.06 


-17.19 


+8.97 


= 


-4.98+ 0.02 




0.2625 


3.40 


-0.10 


-17.16 


+8.99 




-4.87+ 0.03 




0.2700 


3.74 


-0.35 


-16.89 


+8.95 




-4.55+ 0.02 




0.2750 


3.96 


-0.63 


-16.49 


+8.80 




-4.36+ 0.03 




0.2800 


4.20 


-0.98 


-15.96 


+8.52 




-4.22+ 0.03 




0.2900 


4.71 


-1.87 


-14.60 


+7.69 




-4.07+ 0.05 




0.3000 


5.25 


-2.89 


-13.10 


+6.72 




-4.02+ 0.05 




0.3100 


5.82 


-3.97 


-11.64 


+5.78 




-4.01+ 0.06 




0.3200 


6.42 


-5.06 


-10.31 


+4.95 




-4.00+ 0.14 




0.3300 


7.05 


-6.12 


-9.12 


+4.27 




-3.92+ 0.18 
-5.31 



gleets fluctuations produces useless results. 

We note that p(r > A) » 1 for the smallest values 
of A in Table |U This leads to the awkward point that 
if kThip{r > A) is zero, then the hard-sphere contribu- 
tion /xgg is ill-defined. As a general matter, the sum 
p!^ s + kT In p(r > A) cannot be identified as a hard-sphere 
contribution. Since these terms have opposite signs, the 
net value can be zero or negative, and those possibilities 
are realized [Table H|. To define the hard-sphere contri- 
bution more generally, we require /zgg to be continuous 
as A decreases, such that p(r > A) — > 1. All other terms 
of Eq. (J2J will be independent of A for values smaller 
than that, and we will require that of p,^ s also. 

From Fig. |3 we see that A > 0.30 nm clearly identifies 
a larger-size regime where the variation of the free en- 
ergy with A is not statistically significant. Although we 
anticipate a decay toward the numerically exact value 
for A — > oo, the statistical errors become unmanageable 
for values of A much larger than 0.30 nm. When A = 



0.30 nm a significant skew in p(e\r > A) is not observed, 
as already noted with Fig. [21 The predicted free en- 
ergy /i ex is then distinctly above the numerically exact 
value, suggesting that the gaussian model predicts too 
much weight in the high-e tail. We hypothesize that this 
sharper-than-gaussian tail behavior is due to the fact that 
a finite number of molecules make discrete contributions 
to the net e in this tail region. A model distribution 
exhibiting this distinction is 



p [e\r > A) 



f[ y 7r(e fe |r > A) I 5^e- f^e^j ds 



i . . . ds n , (3) 



with 7T (ek\r > A) an elementary distribution with a sharp 
cut-off. The plug density 



£k - 



(e\r > A) 



(4) 
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with H (x) the Heaviside function, is an example. This 
function is non-zero over an e-width of 

Ae = Vl2(fe 2 |r > A) /n (5) 

when the parameterization is set so that Eq. J3j is consis- 
tent with the previous notation. This leaves the number 
n as a single further parameter; a larger n indicates a 
larger range of gaussian behavior for p(e\r > A). With 
the example Eq. Q the evaluation of the thermodynamic 
result is elementary, and amounts to the replacement 

( fe V>A)Hfe>>A)(l- ^ (fe 2 aA> ) (•) 

in Eq. (J2J for large n. 

Since we exploit a single further datum, a single pa- 
rameter exhausts that information. When A = 0.30 nm, 
the values of n that fit the numerically exact results are 
21, 11, and 6 at T= 298, 350, and 400 K, respectively. 
These values are reasonable as indications of the discrete 
number of proximal water molecules that dominate sol- 
vent interactions with the distinguished molecule. 

It is essential, however, to recognize that the sharper- 
than-gaussian hypothesis means sharper than the sin- 
gle gaussian of Eq. J2J). A natural path for improve- 
ment of these results would be a multi-gaussian model, 
as was developed for careful evaluations of the electro- 
static contribution to the free energy of liquid water 
|l2t H5j . But some further points, which touch upon 
similarities of the multi-gaussian and quasi-chemical ap- 
proaches, must be kept in mind. First, we are concerned 
here with contributions from a defined outer-shell region, 
the direct inner-shell contributions have been relegated 
to the kTlnp (r > A) term of Eq. J5J). Second, the multi- 
gaussian approach requires choosing a variable to strat- 
ify the distribution; in the quasi-chemical approach that 
variable is the number of occupants of the defined shell 
|14| . presumably an inner outer-shell region here. Since 



this refinement has exhausted the data here, we don't 
pursue those more refined approaches. But we also recog- 
nize that the initial model, Eq. J2J, is a compact, simple 
implementation of quasi-chemical ideas. 

The values of n are found to correlate positively with 
the variance (fe 2 |r > A), such that Ae [Eq. (JSJ] is only 
weakly dependent on A. At T = 298 K, Ae ks 2 kcal/mol, 
independent of A. At the higher temperatures, Ae is 
1-2 kcal/mol larger, and has a noticeable, linear de- 
pendence on A. These empirical energy parameters are 
of reasonable magnitude by comparison with hydrogen- 
bond energies, and they do not correspond to weak in- 
teractions on a thermal scale. 

Though these theoretical developments were unantic- 
ipated, it is possible to make some connection to classi- 
cal theories. Assume that the iV-molecule fluid can be 
satisfactorily described by a pair-decomposable potential 
energy function. Then a gaussian model for a joint dis- 
tribution of binding energies E\ and e%, of molecules 1 
and 2, respectively, predicts 

/3 2 (& 1 fc 2 |l,2) = ln 2/ (l,2) , (7) 

where y (1, 2) is the two-molecule indirect distribution 
function (la ], and the average is conditional upon loca- 
tion of molecules at (1,2). A point of general interest is 
that Eq. J7J) is a signature of the random-phase family of 
approximations, e.g. the Debye-Hiickel theory. It is also 
interesting that Eq. Q does not express a conventional 
mean-field contribution. However, if the molecules con- 
sidered are significantly different, such a relation then 
is expected to resemble mean-field contributions of the 
classical type. 
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